Partition Reduction for Lossy Data Compression 
Abstract: We consider the computational aspects of the lossy data
compression problem, where the compression error is determined
by a cover of the data space. We propose an algorithm which
reduces the number of partitions that must be examined to find the
entropy with respect to the compression error. In particular, we
show that, in the case of a finite cover, the entropy is attained on
some partition, and we give an algorithmic construction of such a
partition.

I. Introduction 

The basic description of lossy data compression consists of
the quantization of the data space into a partition and (binary)
coding for this partition. Based on the approach of A. Rényi
[1], [2] and E. C. Posner et al. [3]-[5], we have recently
presented a notion of entropy which allows us to combine these
two steps [6]. The main advantage of our description over
classical ones is that we consider general probability spaces
without a metric. This gives us more freedom to define the
coding error.

In this paper we concentrate on the calculation of the
entropy defined in [6]. We propose an algorithm which
drastically reduces the computational effort needed to perform
the lossy data coding procedure.

To explain our results precisely, let us first introduce basic
definitions and give their interpretations. In this paper, if
not stated otherwise, we always assume that $(X, \Sigma, \mu)$ is a
subprobability space¹. As mentioned above, the procedure of
lossy data coding consists of the quantization of the data space
into a partition and binary coding for this partition. We say that
a family $\mathcal{P}$ is a partition if it is a countable family of
measurable, pairwise disjoint subsets of $X$ such that

$$\mu\Big(X \setminus \bigcup_{P \in \mathcal{P}} P\Big) = 0. \tag{1}$$

During encoding we map every given point $x \in X$ to the
unique $P \in \mathcal{P}$ such that $x \in P$. Binary coding for
the partition can then be obtained simply by Huffman coding of the
elements of $\mathcal{P}$.
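The coding step can be sketched as follows. This is a minimal illustration, assuming the partition is finite and is represented only by the list of its cell probabilities; the function name `huffman_code_lengths` is hypothetical and not part of the paper.

```python
import heapq

def huffman_code_lengths(probs):
    """Return the codeword lengths of a binary Huffman code for `probs`."""
    if len(probs) == 1:
        return [1]  # a single cell still needs one bit per symbol
    # Heap entries: (probability, unique tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

# Cell probabilities of a four-element partition.
print(huffman_code_lengths([0.5, 0.25, 0.125, 0.125]))  # [1, 2, 3, 3]
```

For these dyadic probabilities the average code length, 1.75 bits, coincides with the entropy of the partition.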

The statistical amount of information given by optimal lossy
coding of $X$ by elements of a partition $\mathcal{P}$ is determined
by the entropy of $\mathcal{P}$, which is [1]:

$$h(\mu; \mathcal{P}) := \sum_{P \in \mathcal{P}} \mathrm{sh}(\mu(P)), \tag{2}$$

where $\mathrm{sh}(x) := -x \log_2(x)$ for $x \in (0, 1]$ and
$\mathrm{sh}(0) := 0$ is the Shannon function.

¹ We assume that $(X, \Sigma)$ is a measurable space and $\mu(X) \le 1$.
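As a small numerical illustration of the entropy of a partition, the sketch below evaluates it for a partition described only by the masses of its cells. The helper names `sh` and `entropy` are chosen here for illustration.

```python
import math

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def entropy(cell_masses):
    """h(mu; P) for a partition P given by the masses mu(P) of its cells."""
    return sum(sh(m) for m in cell_masses)

# Four equally likely cells carry 2 bits; a single cell carries none.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([1.0]))                     # 0.0
```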

The coding defined by a given partition causes a specific level
of error. To control the maximal error, we fix an error-control
family $\mathcal{Q}$, which is just a measurable cover of $X$. Then we
consider only those partitions $\mathcal{P}$ which are finer than
$\mathcal{Q}$, i.e. we require that for every $P \in \mathcal{P}$ there
exists $Q \in \mathcal{Q}$ such that $P \subset Q$. If this is the case,
then we say that $\mathcal{P}$ is $\mathcal{Q}$-acceptable
and we write $\mathcal{P} \prec \mathcal{Q}$.
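On a finite data space the acceptability condition is a plain subset test. The sketch below assumes that cells and cover elements are given as Python `frozenset` objects; the name `is_acceptable` is hypothetical.

```python
def is_acceptable(partition, cover):
    """True if every cell P of `partition` lies inside some Q of `cover`."""
    return all(any(P <= Q for Q in cover) for P in partition)

cover = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]
fine = [frozenset({0, 1}), frozenset({2, 3})]
coarse = [frozenset({0, 3}), frozenset({1, 2})]

print(is_acceptable(fine, cover))    # True
print(is_acceptable(coarse, cover))  # False: {0, 3} fits in no cover element
```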

To understand better the definition of the error-control 
family let us consider the following example. 

Example I.1. Let $\mathcal{Q}_\varepsilon$ be the family of all intervals
in $\mathbb{R}$ with length $\varepsilon > 0$. Every
$\mathcal{Q}_\varepsilon$-acceptable partition consists of sets
with diameter at most $\varepsilon$. Then, after encoding determined by
such a partition, every symbol can be decoded with precision at
least $\varepsilon$. This error-control family was considered by
A. Rényi [1], [2] in his definition of the entropy dimension. As
a natural extension, he also studied the error-control families
built from all balls of a given radius in general metric spaces.
A similar approach was also used by E. C. Posner [3]-[5] in his
definition of $\varepsilon$-entropy².
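One concrete acceptable partition for the family of intervals of a fixed length is the uniform grid of half-open cells of that length. The sketch below assumes this particular grid and a midpoint decoder; both are illustrative choices rather than anything prescribed by the text.

```python
import math

def encode(x, eps):
    """Index k of the grid cell [k*eps, (k+1)*eps) that contains x."""
    return math.floor(x / eps)

def decode(k, eps):
    """Representative of cell k: its midpoint."""
    return (k + 0.5) * eps

eps = 0.5
for x in (-3.2, 0.49, 7.77):
    k = encode(x, eps)
    print(x, k, decode(k, eps))
```

With the midpoint as the representative, the reconstruction error is at most half the cell length, which is within the precision guaranteed by the cell diameter.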

In the case of general measures, it seems to be more 
natural to vary the lengths of intervals from the error-control 
family. Less probable events should be coded with lower 
precision (longer intervals) while more probable ones with 
higher accuracy (shorter intervals). Our approach makes it easy
to deal with such situations.

To describe the best lossy coding determined by
$\mathcal{Q}$-acceptable partitions, we define the entropy of
$\mathcal{Q}$ as

$$H(\mu; \mathcal{Q}) := \inf\{h(\mu; \mathcal{P}) \in [0; \infty] : \mathcal{P} \text{ is a partition and } \mathcal{P} \prec \mathcal{Q}\}. \tag{3}$$

² E. C. Posner in fact considered $(\varepsilon, \delta)$-entropy, which
differs slightly from our approach.

Let us observe the main difficulty in applying this approach
to lossy data coding:

Example I.2. Let us consider the data space $\mathbb{R}$ and the
error-control family $\mathcal{Q} = \{(-\infty, 1], [0, +\infty)\}$. Even
in such a simple situation there exist uncountably many
$\mathcal{Q}$-acceptable partitions which have to be considered to
find $H(\mu; \mathcal{Q})$.
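A finite analogue makes the difficulty tangible. The sketch below assumes a three-point data space with two overlapping cover elements, enumerates every set partition, and keeps the acceptable one with the smallest entropy. The helpers `set_partitions` and `brute_force_entropy` are hypothetical names, and the number of candidates grows as the Bell numbers, so this naive search is only feasible for very small spaces.

```python
import math

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [block | {first}] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield smaller + [frozenset({first})]

def brute_force_entropy(mu, cover):
    """Minimum of h(mu; P) over all acceptable partitions P of the points of mu."""
    best = math.inf
    for partition in set_partitions(list(mu)):
        if all(any(P <= Q for Q in cover) for P in partition):
            h = sum(sh(sum(mu[x] for x in P)) for P in partition)
            best = min(best, h)
    return best

mu = {0: 0.25, 1: 0.5, 2: 0.25}                 # point masses (assumed data)
cover = [frozenset({0, 1}), frozenset({1, 2})]  # two overlapping cover elements
print(sum(1 for _ in set_partitions(list(mu))))  # 5 candidate partitions
print(brute_force_entropy(mu, cover))
```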

In this paper, we show how to reduce the aforementioned
problem to an at most countable one. In the next section,
we propose an algorithm which, for a given partition
$\mathcal{P} \prec \mathcal{Q}$, constructs a $\mathcal{Q}$-acceptable
partition $\mathcal{R} \subset \Sigma_{\mathcal{Q}}$ with entropy not
greater than that of $\mathcal{P}$, where $\Sigma_{\mathcal{Q}}$ denotes
the sigma algebra generated by $\mathcal{Q}$ (see Algorithm II.1).

As a consequence we obtain that the entropy $H(\mu; \mathcal{Q})$
can be computed using only partitions
$\mathcal{P} \subset \Sigma_{\mathcal{Q}}$ (see Corollary III.1). In the
case of finite error-control families $\mathcal{Q}$, we get
an algorithmic construction of an optimal $\mathcal{Q}$-acceptable
partition. More precisely, if $\mathcal{Q}$ is a finite error-control
family, then there exist $k$ sets $Q_1, \ldots, Q_k \in \mathcal{Q}$
such that (see Corollary III.3):

$$H(\mu; \mathcal{Q}) = h\Big(\mu; \Big\{Q_i \setminus \bigcup_{j=1}^{i-1} Q_j\Big\}_{i=1}^{k}\Big). \tag{4}$$

II. Algorithm for Partition Reduction 

In this section we present an algorithm which, for a given
$\mathcal{Q}$-acceptable partition $\mathcal{P}$, constructs a
$\mathcal{Q}$-acceptable partition
$\mathcal{R} \subset \Sigma_{\mathcal{Q}}$ with not greater entropy.
We give a detailed explanation
that $h(\mu; \mathcal{R}) \le h(\mu; \mathcal{P})$.

We first establish the notation: for a given family $\mathcal{Q}$ of
subsets of $X$ and a set $A \subset X$, we denote:

$$\mathcal{Q}_A = \{Q \cap A : Q \in \mathcal{Q}\}. \tag{5}$$

Let $\mathcal{Q}$ be an error-control family on $X$ and let
$\mathcal{P}$ be a $\mathcal{Q}$-acceptable partition of $X$. We build
a family $\mathcal{R}$ according to the following algorithm:

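Algorithm II.1.

    initialization
        $i := 1$
        $X_1 := X$
        $\mathcal{R} := \emptyset$
    while $\mu(X_i) > 0$ do
        Let $P_i^{\max} \in \mathcal{P}_{X_i}$ be such that
            $\mu(P_i^{\max}) = \max\{\mu(P) : P \in \mathcal{P}_{X_i}\}$
        Let $R_i \in \mathcal{Q}_{X_i}$ be an arbitrary set
            which satisfies $P_i^{\max} \subset R_i$
        $\mathcal{R} := \mathcal{R} \cup \{R_i\}$
        $X_{i+1} := X \setminus (R_1 \cup \ldots \cup R_i)$
        $i := i + 1$
    end while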
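The loop above can be sketched in a few lines for a finite data space. This is an illustration under stated assumptions, not a general implementation: the measure is a dictionary of point masses, cells and cover elements are `frozenset` objects, the input partition is acceptable and covers every point of positive mass, and the names `reduce_partition`, `mass` and `entropy` are hypothetical.

```python
import math

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def mass(mu, A):
    """mu(A) for a finite set A of points."""
    return sum(mu[x] for x in A)

def entropy(mu, family):
    """h(mu; family) for a finite family of pairwise disjoint sets."""
    return sum(sh(mass(mu, A)) for A in family)

def reduce_partition(mu, partition, cover):
    """Finite-space sketch of the partition reduction loop."""
    # Keep only points of positive mass, so "non-empty" means "positive measure".
    remaining = frozenset(x for x in mu if mu[x] > 0)
    reduced = []
    while remaining:
        # Heaviest cell of the partition restricted to the remaining space.
        p_max = max((P & remaining for P in partition),
                    key=lambda A: mass(mu, A))
        # Any restricted cover element that contains this heaviest cell.
        r = next(Q & remaining for Q in cover if p_max <= Q)
        reduced.append(r)
        remaining -= r
    return reduced

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
cover = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]
partition = [frozenset({x}) for x in mu]       # finest possible partition
reduced = reduce_partition(mu, partition, cover)
print(reduced)
print(entropy(mu, partition), entropy(mu, reduced))
```

In this example the four singletons are replaced by the two sets `{1, 2, 3}` and `{0}`, both obtained from cover elements by intersections and differences.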

We are going to show that Algorithm II.1 produces a
partition $\mathcal{R}$ with entropy not greater than that of
$\mathcal{P}$. Before that, for the convenience of the reader, we
recall two important facts, to which we will refer in further
considerations.

Observation II.1. Given numbers $p \ge q \ge 0$ and $r \ge 0$ such
that $p, q, p + r, q - r \in [0, 1]$, we have:

$$\mathrm{sh}(p) + \mathrm{sh}(q) \ge \mathrm{sh}(p + r) + \mathrm{sh}(q - r). \tag{6}$$



Proof: For the proof we refer the reader to [7, Section 6],
where a similar problem is illustrated for $p + q = 1$. ■
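Inequality (6) says that moving mass from the lighter of two cells to the heavier one cannot increase the sum of their Shannon terms. The sketch below is a randomized numerical sanity check, not a proof; the sampling scheme and the tolerance `tol` are assumptions made for illustration.

```python
import math
import random

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def observation_holds(p, q, r, tol=1e-12):
    """Check sh(p) + sh(q) >= sh(p + r) + sh(q - r) up to rounding error."""
    return sh(p) + sh(q) + tol >= sh(p + r) + sh(q - r)

random.seed(0)
samples = []
for _ in range(10_000):
    p = random.random()
    q = random.uniform(0.0, p)                 # ensures q <= p
    r = random.uniform(0.0, min(q, 1.0 - p))   # keeps p + r <= 1 and q - r >= 0
    samples.append((p, q, r))

print(all(observation_holds(p, q, r) for p, q, r in samples))
```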

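Consequence of Lebesgue Theorem (see [8]). Let
$g : \mathbb{N} \to \mathbb{R}$ be summable, i.e. $\sum_{k \in \mathbb{N}} g(k) < \infty$,
and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions
$\mathbb{N} \to \mathbb{R}$ such that $|f_n| \le g$ for $n \in \mathbb{N}$.
If the sequence $\{f_n\}_{n=1}^{\infty}$ is pointwise convergent, i.e.
$\lim_{n \to \infty} f_n(k)$ exists for every $k \in \mathbb{N}$, then
$\lim_{n \to \infty} f_n$ is summable and

$$\sum_{k \in \mathbb{N}} \lim_{n \to \infty} f_n(k) = \lim_{n \to \infty} \sum_{k \in \mathbb{N}} f_n(k). \tag{7}$$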

Let us move to the analysis of Algorithm II.1. We first check
what happens in a single iteration of the algorithm.

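Lemma II.1. We consider an error-control family $\mathcal{Q}$ and a
$\mathcal{Q}$-acceptable partition $\mathcal{P}$ of $X$. Let
$P_{\max} \in \mathcal{P}$ be such that:

$$\mu(P_{\max}) = \max\{\mu(P) : P \in \mathcal{P}\}. \tag{8}$$

If $Q \in \mathcal{Q}$ is chosen so that $P_{\max} \subset Q$, then

$$h(\mu; \{Q\} \cup \mathcal{P}_{X \setminus Q}) \le h(\mu; \mathcal{P}). \tag{9}$$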

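Proof: Clearly, if $h(\mu; \mathcal{P}) = \infty$ then the inequality (9)
holds trivially. Thus we assume that $h(\mu; \mathcal{P}) < \infty$.

Let us observe that it is enough to consider only elements
of $\mathcal{P}$ with nonzero measure; there are at most countably
many such sets. Thus, let us assume that
$\mathcal{P} = \{P_i\}_{i=1}^{\infty}$
(the case when $\mathcal{P}$ is finite can be treated in a similar manner).

For simplicity we put $P_1 := P_{\max}$. For every $k \in \mathbb{N}$, we
consider the sequence of sets defined by

$$Q_k := \bigcup_{i=1}^{k} (P_i \cap Q). \tag{10}$$

Clearly, for $k \in \mathbb{N}$, we have

$$Q_1 = P_1, \tag{11}$$

$$Q_k \subset Q_{k+1}, \tag{12}$$

$$P_i \cap Q_k = P_i \cap Q, \text{ for } i \le k, \tag{13}$$

$$P_i \cap Q_k = \emptyset, \text{ for } i > k, \tag{14}$$

$$\lim_{n \to \infty} \mu(Q_n) = \mu(Q). \tag{15}$$

To complete the proof it is sufficient to derive that for every
$k \in \mathbb{N}$, we have:

$$h(\mu; \{Q_k\} \cup \mathcal{P}_{X \setminus Q_k}) \ge h(\mu; \{Q_{k+1}\} \cup \mathcal{P}_{X \setminus Q_{k+1}}) \tag{16}$$

and

$$h(\mu; \{Q_k\} \cup \mathcal{P}_{X \setminus Q_k}) \ge h(\mu; \{Q\} \cup \mathcal{P}_{X \setminus Q}). \tag{17}$$

Indeed, since $Q_1 = P_1$, the family
$\{Q_1\} \cup \mathcal{P}_{X \setminus Q_1}$ has the same entropy as
$\mathcal{P}$, so (17) with $k = 1$ is exactly (9).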

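Let $k \in \mathbb{N}$ be arbitrary. Then from (13) and (14), we get

$$h(\mu; \{Q_k\} \cup \mathcal{P}_{X \setminus Q_k}) = \mathrm{sh}(\mu(Q_k)) + \sum_{i=2}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_k)) \tag{18}$$

$$= \mathrm{sh}(\mu(Q_k)) + \sum_{i=2}^{k} \mathrm{sh}(\mu(P_i \setminus Q)) + \sum_{i=k+1}^{\infty} \mathrm{sh}(\mu(P_i)) \tag{19}$$

$$= h(\mu; \{Q_{k+1}\} \cup \mathcal{P}_{X \setminus Q_{k+1}}) + \mathrm{sh}(\mu(Q_k)) - \mathrm{sh}(\mu(Q_{k+1})) \tag{20}$$

$$+ \mathrm{sh}(\mu(P_{k+1})) - \mathrm{sh}(\mu(P_{k+1} \setminus Q)). \tag{21}$$

Making use of Observation II.1 with $p = \mu(Q_k)$,
$q = \mu(P_{k+1})$ and $r = \mu(P_{k+1} \cap Q)$, where $p \ge q$
because $\mu(Q_k) \ge \mu(P_1) = \mu(P_{\max})$, we obtain

$$\mathrm{sh}(\mu(Q_k)) + \mathrm{sh}(\mu(P_{k+1})) \tag{22}$$

$$\ge \mathrm{sh}(\mu(Q_{k+1})) + \mathrm{sh}(\mu(P_{k+1} \setminus Q)), \tag{23}$$

which proves (16).

To derive (17), we first use inequality (16). Then

$$h(\mu; \{Q_k\} \cup \mathcal{P}_{X \setminus Q_k}) = \mathrm{sh}(\mu(Q_k)) + \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_k)) \tag{24}$$

$$\ge \lim_{n \to \infty} \Big[\mathrm{sh}(\mu(Q_n)) + \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_n))\Big]. \tag{25}$$

By (15),

$$\lim_{n \to \infty} \mathrm{sh}(\mu(Q_n)) = \mathrm{sh}(\mu(Q)) < \infty. \tag{26}$$

To calculate $\lim_{n \to \infty} \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_n))$,
we will use the Consequence of Lebesgue Theorem. We consider a sequence
of functions

$$f_n : \mathcal{P} \ni P \to \mathrm{sh}(\mu(P \setminus Q_n)) \in \mathbb{R}, \text{ for } n \in \mathbb{N}. \tag{27}$$

Let us observe that the Shannon function $\mathrm{sh}$ is increasing
on $[0, 2^{-1/\ln 2}]$ and decreasing on $(2^{-1/\ln 2}, 1]$. Thus for a
certain $m \in \mathbb{N}$,

$$\mathrm{sh}(\mu(P_i \setminus Q_n)) \le 1, \text{ for } i \le m \tag{28}$$

and

$$\mathrm{sh}(\mu(P_i \setminus Q_n)) \le \mathrm{sh}(\mu(P_i)), \text{ for } i > m, \tag{29}$$

for every $n \in \mathbb{N}$. Since $h(\mu; \mathcal{P}) < \infty$, then

$$\sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_n)) \le m + \sum_{i=m+1}^{\infty} \mathrm{sh}(\mu(P_i)) < \infty. \tag{30}$$

Moreover,

$$\lim_{n \to \infty} \mathrm{sh}(\mu(P \setminus Q_n)) = \mathrm{sh}(\mu(P \setminus Q)), \tag{31}$$

for every $P \in \mathcal{P}$.

As the sequence of functions
$\{\mathrm{sh}(\mu(P \setminus Q_n))\}_{n \in \mathbb{N}}$ satisfies
the assumptions of the Consequence of Lebesgue Theorem, we get

$$\lim_{n \to \infty} \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_n)) = \sum_{i=1}^{\infty} \lim_{n \to \infty} \mathrm{sh}(\mu(P_i \setminus Q_n)) \tag{32}$$

$$= \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q)) < \infty. \tag{33}$$

Consequently, we have

$$h(\mu; \{Q_k\} \cup \mathcal{P}_{X \setminus Q_k}) \ge \lim_{n \to \infty} \Big[\mathrm{sh}(\mu(Q_n)) + \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q_n))\Big] \tag{34}$$

$$= \mathrm{sh}(\mu(Q)) + \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus Q)) = h(\mu; \{Q\} \cup \mathcal{P}_{X \setminus Q}), \tag{35}$$

which completes the proof. ■

We are ready to summarize the analysis of Algorithm II.1.
We present it in the following two theorems.

Theorem II.1. Let $\mathcal{Q}$ be an error-control family on $X$ and let
$\mathcal{P}$ be a $\mathcal{Q}$-acceptable partition of $X$. The family
$\mathcal{R}$ constructed by Algorithm II.1 is a partition of $X$.

Proof: Directly from Algorithm II.1, we get that $\mathcal{R}$
is a countable family of pairwise disjoint sets.

Let us assume that $\mathcal{R} = \{R_i\}_{i=1}^{\infty}$, since the case
when $\mathcal{R}$ is a finite family is straightforward. To prove that

$$\mu\Big(X \setminus \bigcup_{i=1}^{\infty} R_i\Big) = 0 \tag{36}$$

we will use the Consequence of Lebesgue Theorem.
For every $n \in \mathbb{N}$, we define a function
$f_n : \mathcal{P} \to \mathbb{R}$ by

$$f_n(P) := \mu\Big(P \setminus \bigcup_{i=1}^{n} R_i\Big), \text{ for } P \in \mathcal{P}. \tag{37}$$

Clearly,

$$f_n(P) \le \mu(P), \text{ for } n \in \mathbb{N} \tag{38}$$

and

$$\sum_{P \in \mathcal{P}} \mu(P) \le 1. \tag{39}$$

To see that the sequence $\{f_n\}_{n=1}^{\infty}$ is pointwise convergent
to zero, we apply indirect reasoning. Let $P \in \mathcal{P}$ and let
$\varepsilon > 0$ be such that, for every $n \in \mathbb{N}$,

$$f_n(P) = \mu\Big(P \setminus \bigcup_{i=1}^{n} R_i\Big) > \varepsilon. \tag{40}$$

We put $n := \lceil 1/\varepsilon \rceil$. We assume that we have already
chosen sets $R_1, \ldots, R_n \in \mathcal{R}$. Since
$\mu(P \setminus \bigcup_{j=1}^{i-1} R_j) > \varepsilon$ and $R_i$ contains
a set of maximal measure in $\mathcal{P}_{X_i}$, then
$\mu(R_i) > \varepsilon$, for every $i \le n$. Hence, we have

$$\mu\Big(\bigcup_{i=1}^{n} R_i\Big) = \sum_{i=1}^{n} \mu(R_i) > n\varepsilon \ge 1, \tag{41}$$

as $\mathcal{R}$ is a family of pairwise disjoint sets. Consequently,
since $\mu(X) \le 1$,

$$\mu\Big(P \setminus \bigcup_{i=1}^{n} R_i\Big) < 0, \tag{42}$$

which is a contradiction. Hence the sequence $\{f_n\}_{n=1}^{\infty}$
converges pointwise to zero.

Finally, making use of the Consequence of Lebesgue Theorem, we obtain

$$\mu\Big(X \setminus \bigcup_{i=1}^{\infty} R_i\Big) = \lim_{n \to \infty} \mu\Big(X \setminus \bigcup_{i=1}^{n} R_i\Big) \tag{43}$$

$$= \lim_{n \to \infty} \sum_{P \in \mathcal{P}} \mu\Big(P \setminus \bigcup_{i=1}^{n} R_i\Big) = \sum_{P \in \mathcal{P}} \lim_{n \to \infty} \mu\Big(P \setminus \bigcup_{i=1}^{n} R_i\Big) \tag{44}$$

$$= \sum_{P \in \mathcal{P}} \mu\Big(P \setminus \bigcup_{i=1}^{\infty} R_i\Big) = 0. \tag{45}$$

■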

Theorem II.2. Let $\mathcal{Q}$ be an error-control family on $X$ and
let $\mathcal{P}$ be a $\mathcal{Q}$-acceptable partition of $X$. The
partition $\mathcal{R}$ built by Algorithm II.1 satisfies:

$$h(\mu; \mathcal{R}) \le h(\mu; \mathcal{P}). \tag{46}$$



Proof: If $h(\mu; \mathcal{P}) = \infty$ then the inequality (46) is
straightforward. Thus let us discuss the case when
$h(\mu; \mathcal{P}) < \infty$.

We denote $\mathcal{P} = \{P_i\}_{i=1}^{\infty}$, since at most countably
many elements of a partition can have positive measure (the case
when $\mathcal{P}$ is finite follows similarly). We will use the notation
introduced in Algorithm II.1.

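Directly from Lemma II.1 we obtain

$$h(\mu; \mathcal{P}_{X_k}) \ge h(\mu; \mathcal{P}_{X_{k+1}} \cup \{R_k\}), \text{ for } k \in \mathbb{N}. \tag{47}$$

Consequently, for every $k \in \mathbb{N}$, we get

$$h\Big(\mu; \bigcup_{i=1}^{k} \{R_i\} \cup \mathcal{P}_{X_{k+1}}\Big) \ge h\Big(\mu; \bigcup_{i=1}^{k+1} \{R_i\} \cup \mathcal{P}_{X_{k+2}}\Big). \tag{48}$$

Our goal is to show that

$$h\Big(\mu; \bigcup_{i=1}^{k} \{R_i\} \cup \mathcal{P}_{X_{k+1}}\Big) \ge h(\mu; \mathcal{R}), \tag{49}$$

for every $k \in \mathbb{N}$. Together with (47) for $k = 1$, where
$\mathcal{P}_{X_1} = \mathcal{P}$, this gives (46).

Making use of (48), we have

$$h\Big(\mu; \bigcup_{i=1}^{k} \{R_i\} \cup \mathcal{P}_{X_{k+1}}\Big) \tag{50}$$

$$= \sum_{i=1}^{k} \mathrm{sh}(\mu(R_i)) + \sum_{i=1}^{\infty} \mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{k} R_j\Big)\Big) \tag{51}$$

$$\ge \lim_{n \to \infty} \Big[\sum_{i=1}^{n} \mathrm{sh}(\mu(R_i)) + \sum_{i=1}^{\infty} \mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{n} R_j\Big)\Big)\Big], \tag{52}$$

for every $k \in \mathbb{N}$.

We will calculate
$\lim_{n \to \infty} \sum_{i=1}^{\infty} \mathrm{sh}(\mu(P_i \setminus \bigcup_{j=1}^{n} R_j))$
using the Consequence of Lebesgue Theorem for a sequence of
functions $\{f_n\}_{n=1}^{\infty}$, defined by

$$f_n : \mathcal{P} \ni P \to \mathrm{sh}\Big(\mu\Big(P \setminus \bigcup_{j=1}^{n} R_j\Big)\Big) \in \mathbb{R}, \text{ for } n \in \mathbb{N}. \tag{53}$$

Similarly to the proof of Lemma II.1, we may assume that
there exists $m \in \mathbb{N}$ such that

$$\mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{n} R_j\Big)\Big) \le 1, \text{ for } i \le m \tag{54}$$

and

$$\mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{n} R_j\Big)\Big) \le \mathrm{sh}(\mu(P_i)), \text{ for } i > m, \tag{55}$$

for every $n \in \mathbb{N}$. Moreover,

$$\lim_{n \to \infty} \mathrm{sh}\Big(\mu\Big(P \setminus \bigcup_{j=1}^{n} R_j\Big)\Big) = \mathrm{sh}\Big(\mu\Big(P \setminus \bigcup_{j=1}^{\infty} R_j\Big)\Big) = 0, \tag{56}$$

for every $P \in \mathcal{P}$, since $\mathcal{R}$ is a partition of $X$.

Making use of the Consequence of Lebesgue Theorem, we
get

$$\lim_{n \to \infty} \sum_{i=1}^{\infty} \mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{n} R_j\Big)\Big) = \sum_{i=1}^{\infty} \mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{\infty} R_j\Big)\Big) = 0. \tag{57}$$

Consequently, for every $k \in \mathbb{N}$, we have

$$h\Big(\mu; \bigcup_{i=1}^{k} \{R_i\} \cup \mathcal{P}_{X_{k+1}}\Big) \tag{58}$$

$$\ge \lim_{n \to \infty} \Big[\sum_{i=1}^{n} \mathrm{sh}(\mu(R_i)) + \sum_{i=1}^{\infty} \mathrm{sh}\Big(\mu\Big(P_i \setminus \bigcup_{j=1}^{n} R_j\Big)\Big)\Big] \tag{59}$$

$$= \sum_{i=1}^{\infty} \mathrm{sh}(\mu(R_i)) = h(\mu; \mathcal{R}), \tag{60}$$

which completes the proof. ■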

III. Concluding Remarks 

We have seen that, in computing the entropy with respect
to the error-control family $\mathcal{Q}$, it is sufficient to consider
only partitions constructed from the sigma algebra generated by
$\mathcal{Q}$. Thus, we may rewrite the definition of the entropy with
respect to $\mathcal{Q}$:

Corollary III.1. We have:

$$H(\mu; \mathcal{Q}) = \inf\{h(\mu; \mathcal{P}) \in [0; \infty] : \mathcal{P} \text{ is a partition, } \mathcal{P} \prec \mathcal{Q} \text{ and } \mathcal{P} \subset \Sigma_{\mathcal{Q}}\}. \tag{61}$$

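Let us observe that Algorithm II.1 shows how to find a
$\mathcal{Q}$-acceptable partition with entropy arbitrarily close to
$H(\mu; \mathcal{Q})$:

Corollary III.2. Let $\mathcal{Q}$ be an error-control family on $X$. For
any number $\varepsilon > 0$, there exists a $\mathcal{Q}$-acceptable
partition $\mathcal{P} \subset \Sigma_{\mathcal{Q}}$ such that

$$h(\mu; \mathcal{P}) \le H(\mu; \mathcal{Q}) + \varepsilon. \tag{62}$$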



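Proof: For simplicity let us assume that
$\mathcal{Q} := \{Q_i\}_{i=1}^{\infty}$
(the case when $\mathcal{Q}$ is finite or uncountable follows in a similar
way). Then the partition $\mathcal{P}$ which satisfies the assertion is of
the form:

$$\mathcal{P} := \bigcup_{i=1}^{\infty} \Big\{Q_{\sigma(i)} \setminus \bigcup_{k<i} Q_{\sigma(k)}\Big\}, \tag{63}$$

for a specific permutation $\sigma$ of the natural numbers.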

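When $\mathcal{Q}$ is a finite family, the entropy of $\mathcal{Q}$ is
always attained on some partition $\mathcal{P} \subset \Sigma_{\mathcal{Q}}$.
More precisely, we have:

Corollary III.3. Let $\mathcal{Q}$ be an $n$-element error-control family,
where $n \in \mathbb{N}$. Then there exist sets
$Q_1, \ldots, Q_k \in \mathcal{Q}$, for a specific $k \le n$, such that

$$H(\mu; \mathcal{Q}) = h\Big(\mu; \Big\{Q_i \setminus \bigcup_{j=1}^{i-1} Q_j\Big\}_{i=1}^{k}\Big). \tag{64}$$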
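For a finite cover of a finite data space, the corollary suggests a direct search: try every ordering of the cover elements, form the successive differences, and keep the smallest entropy. The sketch below assumes point masses stored in a dictionary and `frozenset` cover elements; the name `entropy_of_finite_cover` is hypothetical, and cover elements that contribute an empty difference simply add nothing to the entropy.

```python
import math
from itertools import permutations

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def entropy_of_finite_cover(mu, cover):
    """Smallest entropy of the partitions {Q_i minus earlier Q_j} over all orderings."""
    best = math.inf
    for order in permutations(cover):
        used = frozenset()
        h = 0.0
        for Q in order:
            # Mass of the part of Q not claimed by earlier sets in this ordering.
            h += sh(sum(mu[x] for x in Q - used))
            used |= Q
        best = min(best, h)
    return best

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
cover = [frozenset({0, 1, 2}), frozenset({1, 2, 3})]
print(entropy_of_finite_cover(mu, cover))
```

The search visits n! orderings for an n-element cover, far fewer candidates than the set of all acceptable partitions, but still only practical for small n.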

To see that the entropy with respect to an arbitrary, possibly
infinite, error-control family does not have to be attained on
any partition, we use a trivial example from [6, Example II.1]:

Example III.1. Let us consider the open segment $(0, 1)$
with the sigma algebra of all Borel subsets of $(0, 1)$,
Lebesgue measure $\lambda$ and the error-control family defined by

$$\mathcal{Q} = \{[a, b] : 0 < a < b < 1\}. \tag{65}$$

One can verify that $H(\lambda; \mathcal{Q}) = 0$ but clearly
$h(\lambda; \mathcal{P}) > 0$, for
every $\mathcal{Q}$-acceptable partition $\mathcal{P}$.
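One way to see both claims numerically is to pick a concrete family of acceptable partitions. The sketch below assumes the following construction, which is an illustration and not taken from [6]: a middle cell $[\varepsilon, 1 - \varepsilon]$ and two tails split into dyadic cells of masses $\varepsilon/2, \varepsilon/4, \ldots$, each contained in a closed subinterval of $(0, 1)$. Summing the series gives $2\varepsilon + \mathrm{sh}(\varepsilon)$ per tail, so the entropy is positive for every $\varepsilon$ yet tends to zero.

```python
import math

def sh(x):
    """Shannon function: -x*log2(x) on (0, 1), and 0 at the endpoints 0 and 1."""
    return -x * math.log2(x) if 0 < x < 1 else 0.0

def tail_entropy(eps, terms=200):
    """Entropy of cells with masses eps/2, eps/4, ... (series truncated)."""
    return sum(sh(eps * 2.0 ** -k) for k in range(1, terms + 1))

def example_entropy(eps):
    """Entropy of the assumed partition: middle cell plus two dyadic tails."""
    return sh(1 - 2 * eps) + 2 * (2 * eps + sh(eps))

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, example_entropy(eps))
```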

As an open problem we leave the following question: 

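Problem III.1. Let $\mathcal{Q}$ be an error-control family. We assume
that if there exists $\{Q_i\}_{i \in \mathbb{N}} \subset \mathcal{Q}$ such
that $Q_k \subset Q_{k+1}$, for every $k \in \mathbb{N}$, then also
$\bigcup_{k \in \mathbb{N}} Q_k \in \mathcal{Q}$. We ask if the entropy
with respect to $\mathcal{Q}$ is realized by some $\mathcal{Q}$-acceptable
partition $\mathcal{P} \subset \Sigma_{\mathcal{Q}}$.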